Generate n-Grams (Characters) (Text Processing)
Synopsis
Creates character n-Grams of each token in a document.
Description
This operator creates all possible character n-Grams of each token in a document. A character n-Gram is a series of consecutive characters of length n. For each token, the operator generates every such series of length n that occurs in the token. If a token is shorter than the specified length n, the token itself is kept in the resulting document.
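The definition above can be illustrated with a minimal Python sketch. This is not the operator's actual implementation: the function name `char_ngrams` is hypothetical, and the sketch assumes the n-Grams are the consecutive substrings of length n, taken from left to right.

```python
def char_ngrams(token: str, n: int) -> list[str]:
    """Return all consecutive character substrings of length n in token.

    Assumption: a token shorter than n is returned unchanged,
    mirroring the behaviour described for short tokens.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(token) < n:
        return [token]
    # Slide a window of width n across the token, one character at a time.
    return [token[i:i + n] for i in range(len(token) - n + 1)]


print(char_ngrams("text", 3))  # ['tex', 'ext']
print(char_ngrams("an", 3))    # ['an']
```

A token of length m with m >= n yields m - n + 1 n-Grams, so "text" with n = 3 yields two.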
Input
- document
The document port.
Output
- document
The document port.
Parameters
- length
The length n of the n-grams. Range:
- keep_terms
Indicates whether the original terms (i.e. tokens) should be kept along with the created n-grams. Range:
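The effect of the two parameters on a whole token list might be sketched as follows in Python. The function `ngram_document` is hypothetical and only mimics the documented behaviour. The order of the output (original term first, then its n-grams) is an assumption, since the documentation does not specify it.

```python
def ngram_document(tokens, length, keep_terms=False):
    """Apply character n-gram generation to every token in a list.

    Assumptions (not confirmed by the operator documentation):
    - the original term, when kept, is placed before its n-grams;
    - no de-duplication is attempted.
    """
    result = []
    for token in tokens:
        if len(token) < length:
            # Tokens shorter than `length` are kept as they are.
            result.append(token)
            continue
        if keep_terms:
            result.append(token)
        result.extend(
            token[i:i + length] for i in range(len(token) - length + 1)
        )
    return result


print(ngram_document(["data", "ok"], 3))
# ['dat', 'ata', 'ok']
print(ngram_document(["data", "ok"], 3, keep_terms=True))
# ['data', 'dat', 'ata', 'ok']
```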